(54) Image information describing method, video retrieval method, video reproducing method,
and video reproducing apparatus 

(57) Video frames of original video data are sampled at arbitrary
time intervals and in arbitrary size to obtain thumbnail frames. As
thumbnail information concerning these frames, the frame number of
the original video frame corresponding to each thumbnail frame and
the size of each thumbnail frame are described. Further, scene change
information on the original video frames or information on the frame
change value between frames is described together as additional
information, and temporal/spatial thumbnail meta-data is obtained.
The meta-data is associated with the original video data, and a
database is constructed. The meta-data is then employed to perform
typical frame display or variable speed reproduction of the original
video data. In this manner, even a device with low CPU capability can
perform typical frame display or variable speed reproduction for
compressed and encoded video data such as MPEG-2, so that the
contents of a video can be checked and retrieval is easily performed.
Description

[0001] The present invention relates to a method of
describing image information. In particular, the present 
invention relates to a method of describing thumbnail 
information on thumbnail frames which are obtained by 
sampling video frames with arbitrary time intervals and 
in spatially arbitrary size, and video retrieval and video 
reproducing methods and apparatus employing the 
thumbnail information. 

[0002] In recent years, with the advancement of semiconductor
technology and digital signal processing technology, it has become
possible to convert moving image (video) information from analog data
to digital data and to compress the digital data in real time. In
digital satellite broadcasting, for example, digital video data
compressed and encoded by MPEG-2, the international standard for
moving picture compression, is distributed, and the compressed video
data is decompressed and decoded at each home in real time so that
movies or the like can be watched on a television receiver.
[0003] In addition, with the realization of high-density optical
disks, technology for recording digital video data compressed by
MPEG-2 or the like is reaching a practical stage. Typical examples of
such optical disk media include DVD-RAM and CD-RW. Although the
recording time is shorter than that of DVD-RAM, it is also possible
to record digital video data on an HDD. Further, digital video data
recorded on DVD-RAM or the like is expected to be as easy to retrieve
as digitized text or still picture data.
[0004] A classical technique for video retrieval is to define a title
name and a keyword for each video file, such as a movie, and to
retrieve files based on one or both of the title name and the keyword.
Although retrieval itself is easy with this method, it is
disadvantageous in that detailed retrieval according to the content of
a video cannot be done, and whether or not the desired video has been
obtained cannot be determined until the video is actually reproduced
and displayed.
[0005] In recording compressed digital video data, a moving video
image can be handled as a sequence of still image frames. Thus, a
method has been considered in which characteristic image frames,
called typical frames, are selected from an original video by means of
image processing technology and listed. A frame at which a scene is
switched, called a scene change, is often employed as a typical frame.
However, such scene changes occur only once every several seconds, and
occasionally only once every few tens of seconds, and thus there is a
limit to how well the typical frames can express the content of the
video. If an attempt is made to check the contents of frames between
scene changes, the original video data must be decoded and displayed.
[0006] Digital video data compressed in accordance with international
standards such as MPEG-1 and MPEG-2 includes a mechanism for random
access to a certain extent, so that variable speed reproduction (trick
play) such as fast reproduction can be performed. However, such
variable speed reproduction is heavy in processing because it is
performed by manipulating the digital video data itself, and the
processing burden is large for a home receiving device with little
computing power. In addition, as with video on demand or a browser on
the Internet, when variable speed reproduction is performed in an
environment in which digital video data is distributed from a server
installed at a remote site through a network and received by a
computer or a television receiver at home, there is the difficulty
that network traffic increases.

[0007] As described above, conventional video retrieval is generally
based on a title name or a keyword assigned to a video file, and in
reality an environment in which the content of a video can be checked
and retrieved is not sufficiently provided.

[0008] In addition, there is a problem that a method of selecting
scene change frames from an original video as typical frames and
listing them is incapable of checking the contents of video frames
between scene changes.

[0009] Further, in the mechanism for variable speed reproduction
incorporated in international standards for moving image compression
such as MPEG-1 and MPEG-2, variable speed reproduction is performed by
manipulating the digital video data itself. Thus, the processing
burden is large for a small device with little computing power. In
addition, when an attempt is made to perform variable speed
reproduction in an environment in which digital video data distributed
through a network is received, there has been a problem that network
traffic increases.
[0010] Accordingly, it is a main object of the present invention to
provide an image information describing method capable of retrieving
or displaying a video while checking the content of the video.

[0011] A related object of the present invention is to
enable proper video retrieving even when a target frame 
exists between scene changes. 
[0012] A further object of the present invention is to reduce the
processing load of variable speed reproduction of a video so that
variable speed reproduction can be easily achieved by a device with
little computing power or over a network.
[0013] To achieve the foregoing objects, there is provided an image
information describing method according to the present invention,
wherein attribute information for specifying a video frame
corresponding to each of thumbnail frames is described as thumbnail
information concerning the thumbnail frames obtained by sampling the
video frames with arbitrary time interval and size.

[0014] Further, in addition to such attribute information, additional
information corresponding to the video frame is described.

[0015] The attribute information includes either or 
both of position information indicative of a position on a 
time axis of the video frame corresponding to the 
thumbnail frame and size information concerning a size 
of the thumbnail frame. 

[0016] The additional information includes either or 
both of scene change position information on the video 
frame and information on a frame change value 
between the video frames. 

[0017] The thumbnail information may be described
together with the thumbnail frame or a pointer for the 
video frame corresponding to the thumbnail frame. 
[0018] In addition, according to the present inven- 
tion, a storage medium is provided in which the thumb- 
nail information only or the thumbnail information with 
the additional information described by the above image 
information describing method is stored together with 
image data of the video frame or separated from the 
image data. 

[0019] Further, according to the present invention, 
the thumbnail information only or the thumbnail informa- 
tion with the additional information described by the 
above mentioned image information describing method 
is employed, making it possible to provide video 
retrieval or video reproduction based on the thumbnail 
frames as described below. 

[0020] That is, according to a first video retrieval 
method/apparatus, at least first positions on a time axis 
of the video frames corresponding to the thumbnail 
frames are described as the thumbnail information con- 
cerning the thumbnail frames obtained by sampling the 
video frames with arbitrary time interval and size, a sec- 
ond position on the time axis of a target video frame is 
specified, and a thumbnail frame having the first posi- 
tion that is the closest to the second position is retrieved 
based on the first positions and the second position. 
[0021] Thus, the thumbnail information described 
according to the present invention is employed, thereby 
making it possible to easily perform the video retrieval of 
a predetermined frame without any burden on a compu- 
ter power or traffic. 
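
By way of illustration only, the first retrieval method might be sketched as follows, assuming the first positions are held as an ascending list of frame numbers. The function name, the sample values and the tie-breaking rule are hypothetical and are not taken from the described method.

```python
from bisect import bisect_left

def closest_thumbnail(first_positions, second_position):
    """Return the index of the thumbnail whose first position is closest.

    first_positions: ascending frame numbers of the sampled video frames.
    second_position: frame number of the target video frame.
    Ties go to the earlier thumbnail (an assumption of this sketch).
    """
    if not first_positions:
        raise ValueError("no thumbnail information described")
    i = bisect_left(first_positions, second_position)
    if i == 0:
        return 0
    if i == len(first_positions):
        return len(first_positions) - 1
    before, after = first_positions[i - 1], first_positions[i]
    return i - 1 if second_position - before <= after - second_position else i
```

Because only the small list of positions is searched, no frame of the original video data needs to be decoded to locate the thumbnail.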

[0022] According to another video retrieval method/ 
apparatus, at least first positions on a time axis of the 
video frames corresponding to the thumbnail frames are 
described as the thumbnail information concerning the 
thumbnail frames obtained by sampling the video 
frames with arbitrary time interval and size, a scene 
change position on the time axis of the video frames is 
further described as additional information, a second 
position on the time axis of a target video frame is spec- 
ified, and a thumbnail frame having the first position that 
is the closest to the second position which is earlier or 
later than the scene change position is retrieved accord- 
ing to a time relationship between the second position 
and the scene change position that is the closest 
thereto, based on the first positions, the second posi- 
tion, and the scene change position. 



[0023] More specifically, the scene change position that is the
closest to the target frame is detected, and it is determined whether
the target frame is earlier or later than the scene change position.
In the former case, the video frame that is the closest to the target
frame and earlier than the scene change position is retrieved; in the
latter case, the video frame that is the closest to the target frame
and later than the scene change position is retrieved.

[0024] Thus, the scene change position is described as the additional
information, thereby making it possible to retrieve a thumbnail frame
more similar to the target frame.
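
The retrieval that takes the scene change position into account might be sketched as follows. This is a minimal sketch, not the definitive procedure: the function name is hypothetical, and it assumes that a scene change position denotes the first frame of the new scene, so that a thumbnail located exactly at the scene change belongs to the later side.

```python
def retrieve_with_scene_change(first_positions, scene_changes, target):
    """Pick the thumbnail position closest to target on the same side of
    the nearest scene change. Returns None if that side has no thumbnail."""
    if not scene_changes:
        candidates = list(first_positions)
    else:
        # Scene change position closest to the target frame.
        cut = min(scene_changes, key=lambda s: abs(s - target))
        if target < cut:
            # Target is earlier: keep thumbnails before the scene change.
            candidates = [p for p in first_positions if p < cut]
        else:
            # Target is later: keep thumbnails at or after the scene change.
            candidates = [p for p in first_positions if p >= cut]
    if not candidates:
        return None
    return min(candidates, key=lambda p: abs(p - target))
```

With thumbnails at frames 0, 30, 60 and 90 and a scene change at frame 50, a target at frame 48 yields the thumbnail at frame 30 rather than the nearer one at frame 60, because frame 60 belongs to a different scene.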

[0025] According to still another video retrieval method/apparatus,
at least positions on a time axis of the video frames corresponding to
thumbnail frames are described as thumbnail information concerning
the thumbnail frames obtained by sampling the video frames at
arbitrary time intervals and in spatially arbitrary size, a target
image for retrieval is specified, and a thumbnail frame whose
difference from the target image is equal to or less than a
predetermined threshold is retrieved. In this case, the position
information described for the thumbnail frame whose difference from
the target image is equal to or less than the predetermined threshold
may be recorded as the retrieval result.
[0026] Thus, a difference between the target image and each of the
thumbnail frames, for example, a sum of absolute differences, is
obtained, and a thumbnail frame in which this value is minimum is
retrieved, thereby making it possible to retrieve a predetermined
frame.
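
As a hedged sketch of this image-based retrieval, the following assumes grayscale images held as lists of pixel rows and a target image already scaled to the thumbnail size. Both function names and the data layout are assumptions made for illustration.

```python
def sum_abs_diff(a, b):
    """Sum of absolute pixel differences between two equally sized images."""
    return sum(abs(x - y)
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))

def retrieve_by_image(thumbnails, target, threshold):
    """thumbnails: list of (position, image) pairs.

    Returns the positions whose difference from the target is equal to or
    less than the threshold, ordered from smallest difference to largest.
    """
    scored = [(sum_abs_diff(image, target), position)
              for position, image in thumbnails]
    return [position for diff, position in sorted(scored) if diff <= threshold]
```

The first element of the returned list, when present, is the thumbnail with the minimum difference; the full list corresponds to recording the position information as the retrieval result.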

[0027] According to a video reproducing method/apparatus, at least
positions on a time axis of video frames corresponding to thumbnail
frames are described as thumbnail information concerning the
thumbnail frames obtained by sampling the video frames with arbitrary
time intervals and in spatially arbitrary size, information on a frame
change value between two video frames is described as additional
information, and the positions from which the thumbnail frames are
acquired are changed according to the information on the frame change
value, thereby performing variable speed reproduction of the video
using the thumbnail frames.

[0028] That is, the reproduction speed is made slower where the frame
change value is large and higher where the frame change value is
small, thereby making it possible to achieve easy-to-view variable
speed reproduction of the thumbnail frames while keeping the frame
change value constant.
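
One way to picture this is the sketch below, which chooses which thumbnails to display so that roughly the same amount of accumulated change separates successive displayed frames. The function name, the `step` parameter and the accumulation rule are assumptions for illustration, not the procedure of the embodiment.

```python
def select_for_playback(change_values, step):
    """Choose thumbnail indices so that each displayed step spans roughly
    `step` of accumulated frame change.

    change_values[i] is the frame change value between thumbnail i and
    thumbnail i + 1, so there are len(change_values) + 1 thumbnails.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    shown = [0]          # always start from the first thumbnail
    accumulated = 0
    for i, change in enumerate(change_values):
        accumulated += change
        if accumulated >= step:
            shown.append(i + 1)
            accumulated = 0
    return shown
```

Where the change values are small, several thumbnails are skipped between displayed frames (faster reproduction); where they are large, every thumbnail is shown (slower reproduction).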

[0029] This summary of the invention does not necessarily describe all
necessary features so that the invention may also be a
sub-combination of these described features.

[0030] The invention can be more fully understood from the following
detailed description when taken in conjunction with the accompanying
drawings, in which:

FIG. 1 is a view showing a system architecture 
according to one embodiment of the present inven- 
tion; 

FIG. 2 is a conceptual view showing a structure of 
original video data and temporal/spatial thumbnail 
meta-data; 

FIG. 3 is an illustrative view of thumbnail informa- 
tion contained in temporal/spatial thumbnail meta- 
data; 

FIG. 4 is a view showing a management structure 
of the thumbnail information; 
FIG. 5 is a flowchart showing the procedure for 
recording temporal/spatial thumbnail meta-data for 
illustrating the procedure for describing the thumbnail
information;

FIG. 6 is a flowchart showing the procedure for 
retrieving the thumbnail using scene change infor- 
mation contained in the temporal/spatial thumbnail 
meta-data; 

FIG. 7 is a flowchart showing the procedure for 
retrieving the thumbnail based on the temporal/spa- 
tial thumbnail meta-data; 

FIG. 8 is a flowchart showing the procedure for a 
variable speed reproduction using the thumbnails; 
FIG. 9 is a flowchart showing the procedure for a 
smooth variable speed reproduction using the 
thumbnails and frame change value information; 
FIG. 10 is a view showing an example of listing 
thumbnails using the scene change information 
contained in the temporal/spatial thumbnail meta- 
data; 

FIG. 11 is a view showing an example of displaying
original video data and thumbnails using the
temporal/spatial thumbnail meta-data;
FIG. 12 is a view showing another description 
example of the thumbnail information; 
FIG. 13 is a view showing another description 
example of the thumbnail information;
FIG. 14 is a view showing still another description 
example of the thumbnail information; 
FIG. 15 is a flowchart showing retrieval of the 
thumbnail data using the thumbnail information 
according to the description examples shown in 
FIGS. 12 and 14; 

FIG. 16 is a view showing still another description
example of the thumbnail information; 
FIG. 17 is a view showing a specific example of the
thumbnail information according to the description 
example shown in FIG. 16; 
FIG. 18 is a flowchart showing an operation display- 
ing the listing of the thumbnail frames variably in 
number according to a display level; 
FIG. 19 is a view showing a change of the thumb- 
nail frame listing when the display level is varied; 
FIG. 20 is a view showing an example when a plurality
of thumbnail frames with different resolutions
and regions are displayed to be superimposed
based on the thumbnail information according to
the description example shown in FIG. 16; and
FIG. 21 is a view showing another example when a
plurality of thumbnail frames with different resolutions
and regions are displayed to be superimposed
based on the thumbnail information according to
the description example shown in FIG. 16.

[0031] A preferred embodiment of a video retrieving system according
to the present invention will now be described with reference to the
accompanying drawings.

First Embodiment

[0032] FIG. 1 shows a system architecture according to the first
embodiment of the present invention. This system roughly comprises a
database 100, a video display engine 104, a thumbnail
retrieval/display engine 105, a controller 106, and a display device
107. The database 100 contains three components: original video data
101 and temporal/spatial thumbnail meta-data 102, both described later
in detail, and a correspondence table 103 that associates these two
sets of data with each other (a correspondence function table may be
employed instead).

[0033] The database 100 may be concentrated at one site or distributed
across a plurality of sites. In short, it is desirable that the data
can be accessed by the video display engine 104 or the thumbnail
retrieval/display engine 105. The original video data 101 and the
temporal/spatial thumbnail meta-data 102 may be stored in separate
media or in the same medium. As a medium, a DVD or the like is
employed. In addition, the original video data 101 may be data
transmitted via a network without being stored in one medium.

[0034] The video display engine 104 performs processing for displaying
the original video data 101 on the display device 107 under the
control of the controller 106. Further, the video display engine 104
performs processing for displaying a retrieved part of the original
video data 101 on the display device 107 when the original video data
101 is retrieved by the thumbnail retrieval/display engine 105 based
on the temporal/spatial thumbnail meta-data 102.

[0035] Under the control of the controller 106, the thumbnail
retrieval/display engine 105 retrieves proper thumbnail frames in the
vicinity of a predetermined frame of the original video data 101 from
the temporal/spatial thumbnail meta-data 102 described later in
detail, displays these thumbnail frames as typical frames on the
display device 107, and performs retrieval of the original video data
101 via the controller 106 using the temporal/spatial thumbnail
meta-data 102.
[0036] A difference between the thumbnail retrieval/display engine 105
and the video display engine 104 will be described. The former
processes thumbnail frames included in the temporal/spatial thumbnail
meta-data 102, whose data size is small, and thus a sufficient
processing speed can be obtained even if the engine is installed as
software on a low-capacity personal computer incorporated in a
receiving device.
[0037] The latter processes MPEG-2 video data or 
original video data 101 that is analog video data, and 
thus, it is often required to install special hardware. Spe- 
cifically, when the original video data 101 is video data 
compressed by MPEG-2, a special decoder board (an
MPEG-2 decoder) is employed for the video display
engine 104. In addition, when the original video data 
101 is analog video data, a video reproduction device 
such as VTR capable of controlling fast forwarding and 
rewinding is employed as the video display engine 104. 
[0038] If the original video data 101 is video data 
compressed by MPEG-1 or MPEG-4, it is possible to 
install the video display engine 104 as software on a 
personal computer, and it is not required to separate it 
as a system architecture. 

[0039] A vertical line connection in the correspond- 
ence table 103 is conceptual, and it is not required for 
the correspondence table 103 to be physically con- 
nected to the original video data 101 and the tempo- 
ral/spatial thumbnail meta-data 102. Therefore, a 
medium having the original video data 101 stored 
therein may be stored in the same mainframe as the 
video display engine 104. In addition, a medium having 
the temporal/spatial thumbnail meta-data 102 stored 
therein may be stored in the same mainframe as the 
thumbnail retrieval/display engine 105. 
[0040] Even if the medium storing the temporal/spatial thumbnail
meta-data 102 and the thumbnail retrieval/display engine 105 are
located at a distance from each other, a network with relatively small
transmission capacity, for example 10 Mbps, will suffice as the line
connecting the medium and the engine. On the other hand, the line
connecting the medium storing the original video data 101 and the
video display engine 104 is required to have a capacity of 100 Mbps or
more, depending on the medium type.

[0041] A system architecture as shown in FIG. 1 is 
advantageous in that retrieval is based on the tempo- 
ral/spatial thumbnail meta-data 102 with smaller data 
size instead of being based on the original video data 
101, thus making it possible to comfortably perform 
interactive operation and reduce the entire traffic. 
[0042] FIG. 2 is a conceptual view of the original video data 101 and
the temporal/spatial thumbnail meta-data 102. The original video data
101 is digital video data compressed by MPEG-1, MPEG-2, MPEG-4 or the
like, or analog data, and includes a group of video frames
constituting moving images (a video frame group). In addition,
position information indicative of a position on the time axis of each
video frame, for example, position information called media time
(hereinafter simply referred to as "time") or a frame number, is
associated with the original video data 101. The original video data
101 is associated with the temporal/spatial thumbnail meta-data 102 by
time or frame number using the correspondence table 103.

[0043] The temporal/spatial thumbnail meta-data 102 includes thumbnail
information 201_1 to 201_n. Further, in the present embodiment, scene
change position information 202 and frame change value information
203 are included in the temporal/spatial thumbnail meta-data 102 as
additional information.
[0044] The thumbnail information 201_1 to 201_n includes thumbnail
frames obtained by sampling the video frames constituting the original
video data 101 with arbitrary time intervals and in spatially
arbitrary size, together with attribute information for specifying the
thumbnail frames, namely position information (time or frame number)
indicative of the position on the time axis of the original video
frame corresponding to each thumbnail frame, and size information
indicative of the size of the thumbnail frame. Of these items of
attribute information, the former, i.e., the position information
(time or frame number) indicative of the position on the time axis of
the original video frame corresponding to each thumbnail frame, is
described by referring to the correspondence table 103.
[0045] When the original video data 101 has already been digitized, as
with compressed digital video data, the thumbnail frames in the
thumbnail information 201_1 to 201_n of the temporal/spatial thumbnail
meta-data 102 are created by decoding or partially decoding
predetermined frames of the original video data 101. If the original
video data 101 is analog data, the thumbnail frames may be created
after the analog data has been digitized.

[0046] Now, for a case in which the original video data 101 is video
data compressed by MPEG-2, the former item of the attribute
information, i.e., the position information (time or frame number)
indicative of the position on the time axis of the original video
frame corresponding to each thumbnail frame, will be described. In
this case, the original video data 101 compressed by MPEG-2 is
decoded, and the thumbnail frames 201_1 to 201_n are created at a rate
of one per 30 frames while the size is reduced by a ratio of 1/8.
Instead of creating the thumbnail frames by such fixed temporal
sampling and fixed spatial sampling, the thumbnail frames can also be
created by changing these samplings as appropriate. Where the frame
change value is small, it is effective to sample coarsely in the time
direction; where the frame change value is large, it is effective to
sample finely in the time direction.
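
The fixed sampling described above (one thumbnail per 30 frames, size reduced to 1/8) can be illustrated with a few lines of arithmetic. The function name and the 720 x 480 source resolution in the usage note are assumptions chosen only for the example.

```python
def fixed_sampling(total_frames, width, height, interval=30, ratio=8):
    """Return the sampled frame numbers and the thumbnail size.

    interval: one thumbnail is taken every `interval` original frames.
    ratio:    each dimension is reduced to 1/ratio of the original.
    """
    frame_numbers = list(range(0, total_frames, interval))
    size = (width // ratio, height // ratio)
    return frame_numbers, size
```

For a hypothetical 100-frame clip at 720 x 480, this gives thumbnails for frames 0, 30, 60 and 90, each 90 x 60 pixels.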

[0047] In video data compressed by MPEG-2, frames called I pictures
(intra-frame encoded), which are compressed by employing only
correlation within a frame, exist intermittently. Unlike a P picture
(encoded using forward prediction) or a B picture (encoded using
bidirectional prediction), the I picture is not compressed by
employing correlation between frames, and thus decoding is easy. Thus,
when the thumbnail frames are created, only the I pictures of the
original video data 101 are used, and moreover, only the DC components
of the DCT (discrete cosine transform) coefficients of the I pictures
are decoded, thereby making it possible to obtain the
temporal/spatial thumbnail frames more easily.
[0048] Although it is not always ensured that the I pictures exist at
regular frame intervals, a method employing the I pictures is
effective for creating the temporal/spatial thumbnail frames from
video data compressed by MPEG-2 at a speed higher than the video rate.

[0049] A method for creating the thumbnail frames from the I pictures
requires only a small amount of processing. Thus, there is an
advantage that processing can be done by software alone on a personal
computer, without special hardware. In addition, when the thumbnail
frames are created from the original video data 101 via a network,
employing the I pictures makes it possible to easily avoid problems
such as increased traffic.
[0050] On the other hand, the sampling in the spatial direction of the
original video data 101 when the thumbnail frames are created does not
need to be fixed, and can be varied as required. Occasionally, for a
particularly important frame, the thumbnail frame may be expanded as
well as reduced. As described above, the thumbnail information 201
includes the thumbnail frames and the attribute information on the
thumbnail frames. The attribute information includes size information
on the thumbnail frames. Thus, the thumbnail frames can be employed
after they have been changed to a predetermined size as required
during retrieval or display.

[0051] FIG. 3 shows a specific description example of the thumbnail
information 201. The thumbnail information is described for each of
the thumbnail frames. In this example, the information includes: (1)
frame number or time of the original video data corresponding to the
thumbnail frame; (2) size of the thumbnail frame (height x width); (3)
the number of frames of the original video data or time until the next
thumbnail frame; (4) image format of the thumbnail such as JPEG, RGB,
or YUV; and (5) image data of the thumbnail (or pointer for the
original video data 101). Here, (3), (4), and (5) are not essential,
and any of these may be omitted. In addition, additional information
other than (1) to (5) may be further contained.
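
Items (1) to (5) might be represented, purely as an illustrative sketch, by a record such as the one below. The class and field names are hypothetical; only the five items and the optional status of (3) to (5) come from the description.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class ThumbnailInfo:
    # (1) frame number (or time) of the corresponding original video frame
    frame_number: int
    # (2) size of the thumbnail frame as (height, width)
    size: Tuple[int, int]
    # (3) number of original frames until the next thumbnail; optional
    frames_until_next: Optional[int] = None
    # (4) image format such as "JPEG", "RGB" or "YUV"; optional
    image_format: Optional[str] = None
    # (5) thumbnail image data, or a pointer into the original video; optional
    image_data: Optional[bytes] = None
```

A record carrying only the essential items would then be written as `ThumbnailInfo(frame_number=300, size=(60, 90))`.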

[0052] The thumbnail frames are handled as video data having
continuous frames with respect to time (thumbnail video described
later). The video data is compressed into an AVI file or an MPEG-4
file, for example, thereby making it possible to ensure further
compactness. In that case, the video data is directed to a file
pointer for the video frame of the original video data 101 and a frame
number. Therefore, an interface for acquiring an image of an arbitrary
frame from the video data is required.

[0053] FIG. 4 shows a management structure of meta-data 102. In this
example, a list structure is utilized to manage the thumbnail frames
201_1, 201_2, ..., 201_n. From "root" 401, lists 402, 403, 404, ...,
and 405, which serve as pointers for the thumbnail frames 201_1,
201_2, ..., 201_n, are connected in ascending order of frame numbers,
and "end" 406 is set as a final flag. ID1, ID2, ID3, ..., and ID4 of
lists 402, 403, 404, ..., and 405 are conceptual, and mean that these
lists 402, 403, 404, ..., and 405 are arranged in order. In this
example, a pointer indicating where the actual thumbnail frame 201_1,
201_2, 201_3, ..., or 201_4 exists is attached to each of the lists
402, 403, 404, ..., and 405.
[0054] With such a list structure, thumbnail information can be easily
added and deleted. When a new thumbnail frame is added, the frame
numbers are checked in order, and the thumbnail information is
inserted at the position that keeps the frame numbers in ascending
order. When a thumbnail frame is deleted, the corresponding thumbnail
information is simply removed from the list.
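
Addition and deletion in ascending order of frame numbers might be sketched as follows. Note that the description uses a pointer-linked list running from "root" to "end"; this sketch substitutes a sorted Python list with binary search, and the class name, method names and the replace-on-duplicate rule are assumptions.

```python
from bisect import bisect_left

class ThumbnailList:
    """Keeps thumbnail information ordered by ascending frame number."""

    def __init__(self):
        self._frames = []    # ascending frame numbers
        self._entries = []   # thumbnail information, in the same order

    def add(self, frame_number, info):
        i = bisect_left(self._frames, frame_number)
        if i < len(self._frames) and self._frames[i] == frame_number:
            self._entries[i] = info          # replace an existing entry
        else:
            self._frames.insert(i, frame_number)
            self._entries.insert(i, info)

    def delete(self, frame_number):
        i = bisect_left(self._frames, frame_number)
        if i < len(self._frames) and self._frames[i] == frame_number:
            del self._frames[i]
            del self._entries[i]
            return True
        return False                          # nothing registered there

    def frame_numbers(self):
        return list(self._frames)
```

Registering a scene change frame after the I pictures have been registered, for instance, is a single `add` call that leaves the ordering intact.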
[0055] The thumbnail information 201 is managed as a list structure to
facilitate addition and deletion because the thumbnail frames are not
only determined at the outset but are also often added later. For
example, after the I pictures of video data compressed by MPEG-2 have
been registered as thumbnail frames, there may be a case in which a
scene change position of the MPEG-2 compressed video is detected and
the frame at the scene change position is registered as a thumbnail
frame. In this case, the thumbnail frames from the I pictures
described previously are registered as reduced images including only
the DC components. A thumbnail frame at the scene change position is
an important frame, and thus can be registered as a full-size image
frame.
[0056] Another description example of thumbnail 
frame will be described later. 

[0057] Now, the specific procedure of the method for describing the
thumbnail information 201 will be described with reference to FIG. 5,
taking as an example a case in which the original video data 101 is
video data compressed by MPEG-2. FIG. 5 is a flowchart showing the
procedure for recording the temporal/spatial thumbnail meta-data 102
including a description of the thumbnail information 201.

[0058] First, the video frames of the original video data 101 are read
(step S11), and the original video frames are sampled with respect to
time (step S12). A scene change position of the original video data is
detected (step S13). To detect the scene change position, a frame
change value between adjacent frames of the read original video data
101 is calculated, for example, and a position where the change is
equal to or greater than a certain value is detected as a scene change
position.
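
The threshold test of step S13 might be sketched as follows. The frame change value is taken here to be the mean absolute pixel difference between adjacent frames, which is an assumption of this sketch; the function names and the grayscale list-of-rows layout are likewise hypothetical.

```python
def frame_change(prev, curr):
    """Mean absolute pixel difference between two equally sized frames."""
    total = 0
    count = 0
    for row_p, row_c in zip(prev, curr):
        for p, c in zip(row_p, row_c):
            total += abs(p - c)
            count += 1
    return total / count

def detect_scene_changes(frames, threshold):
    """Return indices i where the change from frame i - 1 to frame i
    is equal to or greater than the threshold."""
    return [i for i in range(1, len(frames))
            if frame_change(frames[i - 1], frames[i]) >= threshold]
```

The returned indices play the role of the scene change position information recorded with the thumbnail information.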
[0059] The temporal sampling of the original video data 101 in step
S12 can be done finely in frames including a large motion, for
example, and coarsely in frames including a small motion. In this
example, the original video data 101 is video data compressed by
MPEG-2, and thus, in step S12, the I pictures are extracted to create
thumbnail frames and the P pictures are extracted to detect a frame
change value.

[0060] Next, the I picture extracted in step S12 is sampled spatially,
and one thumbnail frame is created (step S14). More specifically, in
step S14, the pixels of the I picture are decimated, and a thumbnail
frame including a reduced image is created. However, if the I picture
is an important frame such as a scene change position, the original
video frame may be handled as a thumbnail frame without decimation, or
the thumbnail frame may occasionally be created by performing
expansion using pixel interpolation.

[0061] On the other hand, the information on a frame change value,
namely the information on the degree of change of an image between
adjacent frames, is acquired from the P picture extracted in step S12
(step S15). Information on motion vectors from the previous frame is
attached to the P picture as subsidiary information, and thus, a frame
change value can be obtained from the size or distribution of the
motion vectors.
[0062] Next, the thumbnail frames created in step S14 are compressed
and processed as required (step S16). The compressed thumbnail frames,
the scene change position detected in step S13, and the information on
the frame change value acquired in step S15 are then used to record
the temporal/spatial thumbnail meta-data 102 as shown in FIGS. 2 and 3
(step S17), and processing terminates.

[0063] That is, in step S17, three items of information, i.e.,
thumbnail information 201, scene change position information 202, and
frame change value information 203 are recorded as temporal/spatial
thumbnail meta-data 102, as shown in FIG. 2. In addition, the
thumbnail information 201, as shown in FIG. 3, contains: (1) the frame
number or time of the original video data corresponding to the
thumbnail frame; (2) the size (height x width) of the thumbnail frame;
(3) the number of frames of the original video data or the time until
the next thumbnail frame; (4) the image format of the thumbnail such
as JPEG, RGB, or YUV; and (5) the image data of the thumbnail (or a
pointer to the original video data 101). In this example, the image
data of the thumbnail frame shown in (5) is image data of the I
picture extracted in step S12, spatially sampled in step S14, and, as
required, compressed and processed in step S16.
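As an illustration only, the items (1) to (5) and the three kinds of information in the meta-data 102 could be held in memory as in the sketch below. The class and field names are hypothetical. The embodiment specifies which items are described, not a particular data layout or programming language.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ThumbnailInfo:
    """One entry of the thumbnail information 201 (field names are assumptions)."""
    frame_number: int                 # (1) frame number of the original video data
    height: int                       # (2) thumbnail height in pixels
    width: int                        # (2) thumbnail width in pixels
    frames_until_next: Optional[int]  # (3) original frames until the next thumbnail
    image_format: str                 # (4) e.g. "JPEG", "RGB", "YUV"
    image_data: Optional[bytes] = None  # (5) image data, or None if only a pointer is kept


@dataclass
class ThumbnailMetaData:
    """Temporal/spatial thumbnail meta-data 102 (layout is an assumption)."""
    thumbnails: List[ThumbnailInfo] = field(default_factory=list)    # information 201
    scene_change_frames: List[int] = field(default_factory=list)     # information 202
    frame_change_values: List[float] = field(default_factory=list)   # information 203


if __name__ == "__main__":
    info = ThumbnailInfo(frame_number=120, height=90, width=120,
                         frames_until_next=30, image_format="JPEG")
    meta = ThumbnailMetaData(thumbnails=[info], scene_change_frames=[100])
    print(meta)
```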

[0064] Now, how to use the thus recorded tempo- 
ral/spatial thumbnail meta-data 102 will be described. 

(1) Retrieval of the thumbnail frame employing scene 
change position information 

[0065] In the case where a predetermined video frame is to be
displayed, directly retrieving the predetermined video frame from the
original video data 101 requires a long processing time, as described
previously. Instead, when the predetermined frame is retrieved from
the temporal/spatial thumbnail meta-data 102 obtained by sampling the
original video data, the processing time is shortened. However, the
thumbnail frames are sampled with respect to time, and thus, the
predetermined frame image is not always included therein. Thus, the
easiest way is to retrieve and display the thumbnail frame that is the
closest to the predetermined frame with respect to time. FIG. 2 shows
an example in which the thumbnail frame of the thumbnail information
201ₙ that is the closest, with respect to time, to the predetermined
frame indicated by the broken line is defined as the display image
frame.
[0066] In this case, the deviation between the predetermined frame
and the display image frame is determined by the sampling interval
with which the thumbnail frames are created. This deviation is small
if the thumbnail frames are time-sampled with sufficiently short
intervals, and thus, there is almost no problem. However, if a scene
change occurs, the thumbnail frame that is the closest to the
predetermined frame with respect to time is not always proper as a
display image frame. That is, if a scene change occurs between the
predetermined frame and the thumbnail frame contained in the thumbnail
information 201ₙ that is the closest thereto, the thumbnail frame
contained in the thumbnail information 201ₙ₋₁ immediately before the
thumbnail information 201ₙ is more proper as the display image frame.
According to the present embodiment, as shown in FIG. 2, scene change
position information 202 is added as additional information to the
temporal/spatial thumbnail meta-data 102, thereby making it possible
to solve this problem.
[0067] Referring now to the flowchart shown in FIG. 6, the procedure
for retrieving a thumbnail frame representative of a predetermined
frame by employing the scene change position information 202 as
described above will be described. Here, the scene change position
information 202 is represented by a frame number of a scene change
position of the original video data (called a scene change frame
number).

[0068] First, when a frame number of a predeter- 
mined frame to be retrieved is assigned, a scene 
change frame number that is the closest to the frame 
number is retrieved (step S21). 

[0069] Next, it is determined whether the predetermined frame number
lies between the start frame number of the original video data and the
scene change frame number retrieved in step S21 (step S22).
[0070] As a result of determination in step S22, when it is found
that the predetermined frame number is between the start frame number
and the scene change frame number, a thumbnail frame that is the
closest to the predetermined frame number with respect to time (or
spatially) is retrieved between the start frame number and the scene
change frame number (step S23).

[0071] As a result of determination in step S22, when it is found
that the predetermined frame number is not between the start frame
number and the scene change frame number, a thumbnail frame that is
the closest to the predetermined frame number with respect to time (or
spatially) is retrieved between the scene change frame number and the
last frame number of the original video data (step S24).
[0072] Then, the retrieved thumbnail frame is dis- 
played as an image that is the most similar to the prede- 
termined frame (step S25), and processing terminates. 
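The procedure of FIG. 6 (steps S21 to S24) can be sketched as follows. This is a minimal illustration under stated assumptions: thumbnails are identified by the frame numbers of the original video data, a scene change frame number is taken to be the first frame of the new scene, and the search falls back to all thumbnails if the restricted range contains none. The function name and these conventions are hypothetical.

```python
def retrieve_representative_thumbnail(target, thumb_frames, scene_changes,
                                      first_frame, last_frame):
    """Return the thumbnail frame number that best represents `target`.

    thumb_frames  : original-video frame numbers that have thumbnails
    scene_changes : scene change frame numbers (assumed to be the first
                    frame of each new scene)
    """
    if scene_changes:
        # S21: scene change frame number closest to the target frame.
        cut = min(scene_changes, key=lambda s: abs(s - target))
        if target < cut:
            # S22 yes -> S23: search between the start frame and the cut.
            lo, hi = first_frame, cut - 1
        else:
            # S22 no -> S24: search between the cut and the last frame.
            lo, hi = cut, last_frame
    else:
        lo, hi = first_frame, last_frame

    candidates = [f for f in thumb_frames if lo <= f <= hi]
    if not candidates:
        # Fallback not described in the text: ignore the scene boundary.
        candidates = list(thumb_frames)
    # Raises ValueError if there are no thumbnails at all.
    return min(candidates, key=lambda f: abs(f - target))


if __name__ == "__main__":
    thumbs = [0, 30, 60, 90]
    # Frame 48 is nearer to thumbnail 60, but a cut at frame 50 lies between.
    print(retrieve_representative_thumbnail(48, thumbs, [50], 0, 100))
```

With thumbnails at frames 0, 30, 60 and 90 and a scene change at frame 50, frame 48 is represented by thumbnail 30 rather than the temporally nearer thumbnail 60.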

(2) Thumbnail retrieval 

[0073] Referring now to the flowchart shown in FIG. 7, the procedure
for retrieving a thumbnail frame similar to a given image based on the
temporal/spatial thumbnail meta-data 102 will be described.

[0074] First, an image R targeted for retrieval, i.e., 
an image to be retrieved is presented (step S31). 
[0075] Next, the thumbnail frames are acquired in 
order one by one from the temporal/spatial thumbnail 
meta-data 102 (step S33). 

[0076] The image R targeted for retrieval is normal- 
ized to size of the thumbnail frame acquired in step S33 
(step S34). This is because the thumbnail frames are 
different from each other in size. 
[0077] The degree of similarity between a thumb- 
nail frame acquired in step S33 and the image R tar- 
geted for retrieval normalized in step S34, for example, 
a total of absolute value differences for each pixel is cal- 
culated (step S35). 

[0078] It is determined whether the total of these absolute value
differences is equal to or less than a predetermined threshold (step
S36). If the total of the absolute value differences is equal to or
less than the threshold, it is determined that the thumbnail frame
acquired in step S33 is almost identical to the image R targeted for
retrieval, and the frame number of the thumbnail frame is recorded as
the result of retrieval (step S37).
[0079] A series of the above processes is repeated 
until all the thumbnail frames have been obtained in 
step S32, and processing terminates. 
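The loop of FIG. 7 can be sketched as below. The sketch assumes grayscale images stored as lists of rows and uses nearest-neighbour resampling for the normalization in step S34. The embodiment does not specify the resampling method, and the function names are hypothetical.

```python
def resize_nearest(image, height, width):
    """Resample `image` to height x width by nearest-neighbour lookup."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def total_abs_difference(a, b):
    """Total of absolute value differences for each pixel (step S35)."""
    return sum(abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))


def find_similar_thumbnails(query, thumbnails, threshold):
    """Return frame numbers of thumbnails almost identical to `query`.

    thumbnails: iterable of (frame_number, image) pairs, sizes may differ.
    """
    hits = []
    for frame_number, thumb in thumbnails:                            # S33
        normalized = resize_nearest(query, len(thumb), len(thumb[0]))  # S34
        if total_abs_difference(thumb, normalized) <= threshold:      # S35, S36
            hits.append(frame_number)                                  # S37
    return hits


if __name__ == "__main__":
    query = [[10, 10, 200, 200],
             [10, 10, 200, 200],
             [50, 50, 90, 90],
             [50, 50, 90, 90]]
    thumbs = [(0, [[10, 200], [50, 90]]), (30, [[0, 0], [0, 0]]), (60, query)]
    print(find_similar_thumbnails(query, thumbs, threshold=5))
```

Because the total grows with the number of pixels, a single threshold treats small and large thumbnails differently. Dividing the total by the pixel count would be one way to compensate, though this is not stated in the text.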
[0080] After processing has terminated in accord- 
ance with the procedure shown in the flowchart of FIG. 
7, the retrieval result is displayed as follows: 
[0081] The retrieved thumbnail frame is displayed 
on the display device 107 by means of the thumbnail 
retrieval/ display engine 105 in FIG. 1, based on the 
frame number of the thumbnail frame obtained as the 
result of retrieval in step S37. 

[0082] Alternatively, when the original video data 101 is to be
reproduced from the position corresponding to the frame number of the
thumbnail frame obtained as the result of retrieval in step S37, the
correspondence table 103 shown in FIG. 1 (or a correspondence function
table) is employed to look up the frame number of the original video
data 101 corresponding to the frame number of the retrieved thumbnail
frame. Then, the frame number information and a display command are
sent to the controller 106, whereby reproduction is performed from
that frame of the original video data 101 by employing the video
display engine 104, and the data is displayed on the display device
107.

(3) Variable speed reproduction employing temporal/spatial thumbnail
meta-data

[0083] As shown in FIG. 2, in the present embodiment, the scene
change position information 202 and the frame change value information
203 are described in the temporal/spatial thumbnail meta-data 102 as
additional information other than the thumbnail information 201.

[0084] The frame change value information 203 is information
indicative of a frame change value between two video frames in the
original video data 101. For example, a total of absolute-value
differences between frames can be employed; alternatively, when the
original video data 101 is video data compressed by MPEG, an average
(an average power) of the magnitude of the motion vectors over the
entire screen can be calculated from data on motion compensation
between the frames. Such frame change value information 203 is added
to the temporal/spatial thumbnail meta-data 102, thereby making it
possible to perform advanced variable speed reproduction.

[0085] As described in a video reproducing apparatus of Japanese
Patent KOKAI Publication No. 10-243351 (Japanese Patent Application
No. 09-042637), there is known a technique wherein video is reproduced
slowly where a screen change is large, and is reproduced fast where a
screen change is small, thereby achieving variable speed reproduction
that is easy to see by making the frame change value constant. That
publication assumes that a screen change is available for each frame,
and that all of the frames are employed. Unlike the present invention,
it does not mention a case in which thumbnail frames that are discrete
with respect to time are targeted for processing and the frame change
value is also obtained only discretely with respect to time. The
present invention provides a method capable of achieving variable
speed reproduction with a similar effect using such temporally
discrete thumbnail frames and frame change values.
[0086] Now, the basic procedures for performing variable speed
reproduction employing thumbnail frames will be described, referring
to the flowchart shown in FIG. 8.

[0087] First, a range of performing variable speed reproduction
(fast reproduction) is specified (step S41). A start frame number of
the variable speed reproduction range is designated by Fs, and an end
frame number is designated by Fe.

[0088] Next, a reproduction speed ratio 'm' is specified. That is,
it is specified how fast the reproduction is to be performed (step
S42).

[0089] Then, a reproduction direction is specified. Namely, it is
specified whether reproduction is performed in the forward or backward
direction (step S43).
[0090] Further, a reproduction frame rate r [frames/second] of
thumbnail frames is specified (step S44). The reproduction frame rate
r differs depending on the television system. For example, in the case
of NTSC, the rate is 30 [frames/second]; and in the case of PAL, the
rate is 25 [frames/second].
[0091] If the frame rate of the original video data 101 is R
[frames/second], the number of frames to be skipped with respect to
the thumbnail frames is calculated based on the above frame rates for
variable speed reproduction, as described later (step S45).
[0092] In order to perform thumbnail reproduction at a reproduction
frame rate of r [frames/second], the thumbnail frames are acquired and
displayed at a cycle of 1/r seconds (step S46).

[0093] In the case of forward reproduction, reproduction is started
from the thumbnail frame corresponding to the frame 'Fs', and the
frame numbers are skipped in ascending order. In the case of backward
reproduction, reproduction is started from the thumbnail frame
corresponding to the frame 'Fe', and the frame numbers are skipped in
descending order.
[0094] Hereinafter, the processing in step S46 will be described in
more detail. In the case of forward reproduction, the thumbnail frames
are acquired while the frame number is increased by (m × R / r) frames
per cycle. That is, (m × R / r) designates the number of frames to be
skipped in the forward direction calculated in step S45. In step S46,
the thumbnail frame closest to the frame number Fs + (m × R / r) × τ
is reproduced and displayed, wherein τ designates the number of
cycles.
[0095] Similarly, in the case of backward reproduction, the
thumbnail frames are acquired while the frame number is decreased by
(m × R / r) frames per cycle. That is, (m × R / r) designates the
number of frames to be skipped in the backward direction calculated in
step S45. In step S46, the thumbnail frame closest to the frame number
Fe - (m × R / r) × τ is reproduced and displayed.

[0096] In this manner, it becomes possible to perform variable speed
reproduction at an arbitrary reproduction speed ratio employing the
thumbnail frames. When the thumbnail frame to be inputted does not
differ from that of the previous cycle, the same frame may simply
continue to be displayed, thereby making it possible to improve
processing efficiency.
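The constant-ratio procedure of FIG. 8 can be sketched as follows. The sketch only computes which thumbnail would be shown in each display cycle; actual display at a cycle of 1/r seconds is omitted. The function names are hypothetical, and thumbnail frame numbers are assumed to be sorted in ascending order.

```python
import bisect


def nearest_thumbnail(thumb_frames, position):
    """Thumbnail frame number closest to `position` (thumb_frames sorted)."""
    i = bisect.bisect_left(thumb_frames, position)
    if i == 0:
        return thumb_frames[0]
    if i == len(thumb_frames):
        return thumb_frames[-1]
    before, after = thumb_frames[i - 1], thumb_frames[i]
    return before if position - before <= after - position else after


def constant_speed_schedule(thumb_frames, fs, fe, m, R, r, forward=True):
    """List the thumbnail shown in each cycle between frames fs and fe.

    m: reproduction speed ratio, R: frame rate of the original video,
    r: reproduction frame rate of the thumbnails (all positive).
    """
    step = m * R / r  # frames skipped per cycle (step S45)
    schedule = []
    tau = 0           # number of elapsed cycles
    while True:
        position = fs + step * tau if forward else fe - step * tau
        if position < fs or position > fe:
            break
        schedule.append(nearest_thumbnail(thumb_frames, position))  # step S46
        tau += 1
    return schedule


if __name__ == "__main__":
    thumbs = [0, 15, 30, 45, 60, 75, 90]
    print(constant_speed_schedule(thumbs, 0, 90, m=10, R=30, r=30))
```

Consecutive cycles may map to the same thumbnail, as with frames 15, 45 and 75 in the usage example. In that case the frame already on screen can simply be kept, as noted in the text.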

[0097] In the foregoing description, it has been assumed that the
reproduction speed ratio 'm' is constant unless the user changes it.
Now, a method for performing smoother variable speed reproduction by
utilizing the aforementioned frame change value information 203 will
be described. This variable speed reproduction is based on a principle
that the reproduction speed during variable speed reproduction
employing the thumbnail frames is changed with time according to the
frame change value information 203. For clarity of description, it is
assumed that fast reproduction is performed for the entire original
video data 101 without particularly specifying a range of variable
speed reproduction.
[0098] First, parameters are defined as follows:

Total number of frames of original video data 101: K [frames]
Frame rate of original video data 101: R [frames/second]
Reproduction frame rate of thumbnail frames: r [frames/second]
Reproduction speed ratio: m
Frame change value information: Pi (i = 0, ..., n)
Weight to be imparted to reproduction speed corresponding to
thumbnail frame: Wi
Frame number of original video data corresponding to thumbnail
frame: Fi (i = 0, ..., n-1)
Weight to be imparted to reproduction speed corresponding to each
frame of original video data: Wj (j = 0, ..., K-1)

[0099] Now, a limit on the frame change value of a thumbnail frame,
applied where movement is active, is designated by L, and the value
clipped so that it does not exceed the limit L is designated by [Pi].

[Pi] = L, when Pi > L
[Pi] = Pi, when Pi ≤ L    (1)

[0100] In addition, the weight imparted to the reproduction speed
corresponding to a frame change value is designated by Wi = [Pi].

[0101] Next, consider a weight for the reproduction speed of each
frame. The weights Wi corresponding to the discrete thumbnail frames
are linearly interpolated, and the Wj shown below is obtained.

Wj = Wi + (W(i+1) - Wi) / (F(i+1) - Fi) × t    (2)

where t = 0, ..., F(i+1) - Fi, j = Fi, ..., F(i+1) - 1, i = 0, ..., n-1

[0102] When Wj is normalized so that its total summation is 1.0, the
following is obtained (the normalized weight is again denoted Wj):

Wj = Wj / ΣWj, where the summation is taken over j = 0, ..., K-1    (3)

[0103] The display count N required for reproduction at a
reproduction speed ratio 'm' and at a reproduction frame rate r
[frames/second] is obtained by the formula below.

N = K / (m × R / r)    (4)

[0104] When a display image frame is acquired from the thumbnail
frames in consideration of the weight imparted to the reproduction
speed, the weight Wj allocated to each frame is accumulated, and a
thumbnail frame is acquired each time the accumulated value exceeds a
threshold Th = p/N (p = 0, ..., N-1). That is, the thumbnail frame
closest to the frame number at which the accumulated value exceeds the
threshold Th becomes a display image frame.
[0105] If the display image frames are acquired in advance according
to the above calculation, and are displayed at a frame rate of r
[frames/second], the video is displayed at a slow speed where the
frame change value is great and at a fast speed where the frame change
value is small. As a whole, the video is displayed at the
predetermined reproduction speed ratio 'm'. When the above calculation
is employed, it is possible to reproduce a video program of a certain
time length within an arbitrary shorter time. Furthermore, by applying
smoothing to the weight Wj applied to the reproduction speed, or by
performing special weighting at a scene change or at a still image
portion, it is possible to add further special effects to variable
speed reproduction.
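The calculation of formulas (1) to (4) and the threshold test Th = p/N can be sketched as follows. This is an illustration under stated assumptions, not the definitive procedure: one frame change value per thumbnail is assumed, frames before the first and after the last thumbnail take the weight of the nearest thumbnail, N is truncated to an integer, and uniform weights are used if all weights are zero. All function and variable names are hypothetical.

```python
import bisect


def nearest_thumbnail(F, j):
    """Thumbnail frame number in sorted list F that is closest to frame j."""
    i = bisect.bisect_left(F, j)
    if i == 0:
        return F[0]
    if i == len(F):
        return F[-1]
    return F[i - 1] if j - F[i - 1] <= F[i] - j else F[i]


def weighted_schedule(P, F, K, m, R, r, L):
    """Display thumbnails chosen so that reproduction slows where change is large.

    P: frame change value per thumbnail, F: strictly increasing frame
    numbers of the thumbnails, K: total frames, L: clipping limit.
    """
    W = [min(p, L) for p in P]                       # formula (1), Wi = [Pi]
    Wj = []
    for j in range(K):
        i = bisect.bisect_right(F, j) - 1            # last thumbnail at or before j
        if i < 0:
            Wj.append(float(W[0]))                   # assumption: hold first weight
        elif i >= len(F) - 1:
            Wj.append(float(W[-1]))                  # assumption: hold last weight
        else:
            t = j - F[i]
            Wj.append(W[i] + (W[i + 1] - W[i]) / (F[i + 1] - F[i]) * t)  # formula (2)
    total = sum(Wj)
    # Formula (3); uniform weights if everything is zero (an assumption).
    Wj = [w / total for w in Wj] if total > 0 else [1.0 / K] * K
    N = max(1, int(K / (m * R / r)))                 # formula (4), truncated
    schedule, accumulated, j = [], 0.0, 0
    for p in range(N):
        threshold = p / N                            # Th = p / N
        # Advance until the accumulated weight up to frame j exceeds Th.
        while j < K - 1 and accumulated + Wj[j] <= threshold:
            accumulated += Wj[j]
            j += 1
        schedule.append(nearest_thumbnail(F, j))
    return schedule


if __name__ == "__main__":
    # Small change in the first half, large change in the second half.
    print(weighted_schedule(P=[1, 1, 9, 9], F=[0, 10, 20, 30],
                            K=40, m=5, R=30, r=30, L=100))
```

In the usage example most of the eight display cycles are spent on the thumbnails at frames 20 and 30, where the frame change value is large, which corresponds to slow reproduction of that portion.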
[0106] Here, a case in which variable speed reproduction is performed
for the entire original video data 101 has been described. However,
variable speed reproduction of a part of the data can be performed in
entirely the same manner. That is, once Wj has been calculated for the
entire original video data 101, partial reproduction can be easily
performed. In addition, the above description of variable speed
reproduction for the entire original video data 101 assumes that frame
change value information exists at the start and end frames of
variable speed reproduction. Otherwise, the calculation is performed
by using the frame change value information of a suitably close frame
or by giving a default value.
[0107] Hereinafter, referring to the flowchart shown in FIG. 9, a
specific processing procedure for performing smoother variable speed
reproduction by utilizing the frame change value information 203 as
described above will be described. In FIG. 9, the processing in steps
S51 to S54 is basically similar to that in steps S41 to S44 shown in
FIG. 8.
[0108] That is, a range for variable speed reproduction (rapid
reproduction in this case) with a constant frame change value is
specified (step S51). The start frame of the variable speed
reproduction range is defined as 'Fs', and the end frame is defined as
'Fe'. Next, a reproduction speed ratio 'm' is specified. That is, it
is specified how fast the reproduction is to be performed (step S52).
Next, a reproduction direction is specified. Namely, it is specified
whether fast reproduction is performed in the forward or backward
direction (step S53). Then, a reproduction frame rate r
[frames/second] of the thumbnail frames is specified (step S54).

[0109] Thereafter, the required display count N is calculated by the
formula (4) (step S55). The position of the thumbnail frame at which
the accumulated value of Wj shown in the formula (3) exceeds a
threshold value of Th = p/N (p = 0, ..., N-1), namely, the thumbnail
frame closest to the frame number at which the accumulated value
exceeds the threshold Th, is calculated as a display image frame
position, and the calculated position is recorded in a table (step
S56).
[0110] In order to reproduce and display thumbnail frames at a
reproduction frame rate 'r' [frames/second], a display thumbnail frame
is acquired and displayed by employing the above table at a cycle of
1/r seconds (step S57).

[0111] When the thumbnail frames are thus employed to perform
variable speed reproduction, the reproduction speed is changed
according to the frame change value. Namely, the reproduction speed is
made slow where the frame change value is great and is made fast where
the frame change value is small, whereby variable speed reproduction
in which the frame change value is kept constant, similar to "a video
reproducing apparatus" disclosed in Japanese Patent KOKAI Publication
No. 10-243351 (Japanese Patent Application No. 09-042637), can be
achieved for the thumbnail frames.

(4) Other application aspect 

[0112] FIG. 10 is an example listing the thumbnail frames 501, 502,
... closest to the scene change positions (cut points) selected by the
method described above. Such a listing screen 500 can be created at a
high speed because no image frame needs to be inputted from the
original video data.

[0113] FIG. 11 is an example in which the entire original video is
represented by one bar 601, together with a bar 602 in which a
specified range of the bar 601 is enlarged. On the enlarged bar 602,
images of the frames at the cut points included in this range of the
original video are displayed as headings. When a mouse cursor 603 is
placed on the enlarged bar 602, the thumbnail frame 604 most similar
to the image frame at the position of the mouse cursor 603 is selected
in consideration of the positions of the cut points, and can be
displayed as an icon. Since this processing can be performed at a high
speed, sliding the mouse cursor horizontally makes it possible to
display the icon image in real time as if it were a moving image.

[0114] On the other hand, when application to a monitoring system is
considered, there is a demand for efficiently finding infrequent
events. For example, assume that only a background image is normally
displayed on the monitoring screen, and that an intruder appears at a
certain time. Such an intruder can be easily found from a difference
image with respect to the background image. In addition, while the
video is recorded, thumbnail frames are sampled coarsely with respect
to time where no change occurs on the screen, and finely with respect
to time where a change occurs on the screen, thereby making it
possible to reliably record the intruder. Information for management
of cut points or the like is stored as additional information for the
screens on which the intruder is displayed, making it possible to
display a list later. In addition, spatial sampling of the thumbnail
frames may be made finer only when an intruder is found, thereby
making it possible to check the intruder even in the thumbnail frames.

[0115] Further, it is effective to acquire a still image with a
higher resolution than the original video when the intruder has
entered, and to manage it as a thumbnail frame. When the resolution of
ordinary video is insufficient, it is possible to identify the
intruder by employing the still image with a higher resolution than
the original video.
[0116] As has been described above, according to the present
embodiment, thumbnail information including the thumbnail frames,
obtained by sampling the original video frames with arbitrary time
intervals and in an arbitrary spatial size, and their attribute
information is recorded in advance separately from the original video
data, and the thumbnail information rather than the original video
data is retrieved, thereby making it possible to easily perform video
retrieval for a predetermined frame without placing a burden on
computing power or traffic. In addition, the scene change position
information is added to the thumbnail information as additional
information, thereby making it possible to retrieve a thumbnail frame
more similar to the predetermined frame. Further, a difference between
a predetermined image targeted for retrieval and the image of each
thumbnail frame, for example, a total of absolute value differences,
is obtained, and a thumbnail frame whose total of absolute value
differences is small is retrieved, thereby making it possible to
retrieve the predetermined image. Furthermore, the reproduction speed
is made slow where the frame change value is great, and is made fast
where the frame change value is small, thereby making it possible to
achieve variable speed reproduction of the thumbnail frames which is
easy to see and in which the frame change value is kept constant.

[0117] Other embodiments of the video retrieval system according to
the present invention will be described. The same portions as those of
the first embodiment will be indicated by the same reference numerals,
and their detailed description will be omitted.

Second Embodiment 


[0118] In the first embodiment, the temporal/spatial thumbnail
meta-data 102 is assumed to have a plurality of items of thumbnail
information 201₁ to 201ₙ. A description example thereof was not
described in detail. The second embodiment, which concerns such a
specific description example, will be described below.
[0119] FIG. 12 shows a description example of thumbnail information
of the second embodiment. In the figure, a group of the thumbnail
frames is handled as one video (thumbnail video), and thumbnail video
information 701 is configured as a set of the thumbnail information.
When the thumbnail video is provided separately from the thumbnail
video information 701, its location may be described in the thumbnail
video information 701 by a URL or the like; alternatively, the
thumbnail video itself may be described directly as the thumbnail
video information 701.

[0120] Thumbnail information 702 indicates a correspondence between
the thumbnail frame in the thumbnail video indicated by the thumbnail
video information
701 and the original video data frame, and is described 
in plurality according to the number of thumbnail frames 
contained in the thumbnail video. The thumbnail infor- 
mation 702 includes a media time 703 of the original 
video frame and a media time 704 of the thumbnail 
video. The media time 703 of the original video frame 
indicates the original video frame corresponding to the 
thumbnail frame. If the original video frame can be 
uniquely determined, it may be time such as a time 
stamp or a frame number or the like. In addition, in the 
case where a corresponding original video frame is 
obtained by calculation, for example, in the case where 
original video frames are sampled with constant inter- 
vals, information (for example, sampling intervals) 
required for calculation is described, whereby the media 
time 703 of the original video frame may be omitted. 
The media time 704 of the thumbnail video indicates a 
specific thumbnail frame in the thumbnail video indi- 
cated by the thumbnail video information 701. If the 
thumbnail frame can be uniquely determined, the media 
time 704 of the thumbnail may be a frame number or the 
like. If the thumbnail video is handled as a general 
video, it may be a time such as time stamp. In addition, 
when correspondence with the thumbnail video is per- 
formed sequentially, it may be omitted. 
[0121] FIG. 13 shows another description example 
of thumbnail information. Thumbnail information 801 
presents a correspondence between each thumbnail 
frame and the original video data frame, and is 
described in plurality according to the number of thumb- 
nail frames. The thumbnail information 801 includes a 
media time 802 of the original video frame and thumb- 
nail data 803. The media time 802 indicates a frame 
position of the original video data corresponding to the 
thumbnail frame, similar to the media time 703 in the 
description example shown in FIG. 12. This media time 
802 may be omitted in a manner similar to that in the 
media time 703. When the thumbnail frames are individually provided
separately from the thumbnail data 803, their locations may be
described by URLs or the like; alternatively, the thumbnail frames may
be described directly as the thumbnail data 803. In addition, instead
of a thumbnail, another image such as an illustration indicative of
the content may be employed as thumbnail data.
[0122] FIG. 14 shows another description example of thumbnail
information. The description example shown in FIG. 14 combines both of
the description examples shown in FIGS. 12 and 13. Thumbnail
video information 901 is similar to the thumbnail video 
information 701 in the description example shown in 
FIG. 12, and denotes URL indicating a site of this video 
or a thumbnail video itself. Thumbnail information 902 
presents a correspondence between each thumbnail 
frame and the original video data frame, and is 
described in plurality according to the number of thumb- 
nail frames. The thumbnail information 902 includes a 
media time 903 of the original video frame and either of 
the media time 904A or thumbnail data 904B of the 
thumbnail video. The media time 903 of the original 
video frame indicates a frame of the original video data 
corresponding to the thumbnail frame, similar to the 
media time 703 in the description example shown in 
FIG. 12. This media time 903 may be omitted in a manner similar to
that in the media time 703. The media time 904A of the thumbnail video
is similar to the media time 704 in the description example shown in
FIG. 12, and indicates a specific thumbnail frame in the thumbnail
video indicated by the thumbnail video information 901. If the media
time 904A is sequentially associated with the thumbnails, it may be
omitted. Thumbnail data 904B is
similar to thumbnail data 803 in the description example 
shown in FIG. 13, and indicates sites of the individual 
thumbnail frames or a thumbnail frame itself. 
[0123] According to the description example shown 
in FIG. 14, a part of the thumbnail video can be replaced 
with another, or another thumbnail can be added. 
[0124] Now, processing for extracting thumbnail 
data of a predetermined media time will be described by 
referring to the description examples shown in FIGS. 12 
to 14. FIG. 15 is a basic flowchart thereof. In step S61,
a predetermined media time of the original video frame 
corresponding to a predetermined thumbnail frame is 
inputted. The media time uniquely indicates a time- 
related position in the media such as time stamp or 
frame number. In step S62, first thumbnail information is 
inputted from among the thumbnail information groups 
described in the description examples shown in FIGS. 
12 to 14. In step S63, the predetermined media time is 
compared with a media time of the original video frame 
contained in the thumbnail information. If both of them 
are identical or the predetermined media time is later, 
the processing goes to step S64, and the thumbnail 
data indicated in the thumbnail information is inputted. A 
thumbnail data extraction method differs depending on 
a describing method. When a thumbnail frame number 
is described, the corresponding thumbnail data of the 
thumbnail video is extracted. When the thumbnail data 
itself is described, the data is employed as is. When a 
media time contained in the thumbnail information is 
later than the predetermined media time, the process- 
ing goes to step S65. Then, next thumbnail information 
is inputted from the thumbnail information group, the 
processing goes to step S63 again, and media time 
comparison is performed. 



[0125] FIG. 16 is a description example when the 
attribute information of a thumbnail frame is added to 
the description examples shown in FIGS. 12 to 14. A
thumbnail video may consist of thumbnail frames of
different sizes, or of thumbnail frames obtained by cutting
out only a part of a frame of the original video data.
Thus, the description example shown in FIG. 16 is 
directed to an example of describing these parameters 
as attribute information. 

[0126] Thumbnail group information 1001 indicates
information in accordance with a description example or
the like shown in FIGS. 12 to 14. Thumbnail attribute
information 1002 is directed to attribute information of
individual thumbnail frames, and is described as many
times as the number of thumbnail frames contained
in a thumbnail video. The thumbnail attribute information
1002 includes thumbnail number 1003, resolution
information 1004, and region information 1005.
[0127] The thumbnail number 1003 is a number
corresponding to a specific thumbnail frame contained
in the thumbnail frame group indicated in the thumbnail
group information 1001. If the thumbnail number
1003 sequentially corresponds to the thumbnail frames
in the thumbnail frame group, it may be omitted.

[0128] The resolution information 1004 indicates
resolution of the original image data corresponding to 
the thumbnail frame indicated by the thumbnail number 
1003. For example, a reduction rate of the image or the 
like is described. 

[0129] The region information 1005 indicates the
region in a frame of the original video data corresponding
to the thumbnail frame indicated by the thumbnail
number 1003. When the thumbnail frame cuts out a part
of the corresponding frame of the original video data,
that region is described as is. When a thumbnail frame
is equivalent to the whole corresponding frame of the
original video data, the region information may be omitted.
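
As an illustration of the attribute information described for FIG. 16, the three items could be held in a record such as the one below. This is only a sketch: the class name, field names, the (x, y, width, height) layout of the region, and the helper function are assumptions, not part of the description example. Omitted items are modeled as `None`.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ThumbnailAttribute:
    """Attribute information 1002 for one thumbnail frame (hypothetical layout)."""
    resolution: Optional[float] = None                    # 1004, e.g. a reduction rate such as 0.5
    region: Optional[Tuple[int, int, int, int]] = None    # 1005 as (x, y, width, height); None = whole frame
    thumbnail_number: Optional[int] = None                # 1003; None when it is omitted


def resolve_numbers(attributes: List[ThumbnailAttribute]) -> List[int]:
    """Fill in omitted thumbnail numbers from the sequential position of each entry."""
    return [a.thumbnail_number if a.thumbnail_number is not None else i
            for i, a in enumerate(attributes)]
```

When the thumbnail number is omitted, the position of the entry in the list stands in for it, which mirrors the sequential correspondence mentioned in the text.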

[0130] Although not shown here, these items of
attribute information may be described in each item of
thumbnail information in the description examples shown
in FIGS. 12 to 14.

[0131] FIG. 17 is an actual description example
when the describing method shown in FIG. 16 is
employed. Assume that an object exists in a part of the
original video frame 1401. When a thumbnail frame of
the original video frame 1401 is created, a thumbnail
frame showing the image contents in more detail
can be created by sampling only a part of the screen,
rather than sampling the entire screen. A
rectangular region 1402 in the original video frame 1401
is selected, sampling is performed so that the height
and width are reduced to 1/2, and a thumbnail frame
1403 is created. At this time, a description example of
resolution information and region information is represented
by 1404.
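
The operation described for FIG. 17 (cutting out a rectangular region and then sampling it so that height and width are halved) can be sketched as below. The frame is modeled as a list of pixel rows, the region as (x, y, width, height), and sampling as simple decimation; all of these, and the function name, are assumptions for illustration. The actual contents of description example 1404 are not reproduced here.

```python
from typing import List, Tuple

Frame = List[List[int]]


def make_region_thumbnail(frame: Frame,
                          region: Tuple[int, int, int, int],
                          step: int) -> Frame:
    """Cut out region (x, y, width, height) and keep every step-th pixel and row."""
    x, y, w, h = region
    cropped = [row[x:x + w] for row in frame[y:y + h]]   # cut out the rectangular region
    return [row[::step] for row in cropped[::step]]      # reduce height and width to 1/step


# An 8x8 test frame whose pixel value encodes its position (row * 10 + column).
frame = [[r * 10 + c for c in range(8)] for r in range(8)]
thumbnail = make_region_thumbnail(frame, region=(2, 2, 4, 4), step=2)
```

With `step=2`, a 4x4 region yields a 2x2 thumbnail frame, corresponding to a reduction of height and width to 1/2.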

[0132] FIG. 18 is a basic flowchart when thumbnail
frames are listed according to the user request. In step
S71, the user inputs a listing level or display level. As the
inputting method, a GUI element such as a slider which
continuously changes the display level may be
employed, or numeric data may be directly inputted.
Alternatively, an input device such as a wheel or dial
connected to a computer or the like may be employed.
[0133] In step S72, the number of thumbnail frames
to be listed is calculated from the level value inputted in
step S71. For example, assuming that the maximum display
level is designated by Lmax, the maximum number
of display thumbnail frames is designated by Tmax, and
the current display level is designated by L, the number
of display thumbnail frames T can be obtained by
T = Tmax × L / Lmax.
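
A minimal sketch of the calculation in step S72 follows. The text does not say how a fractional result is handled, so rounding down and clamping the level to the range 0 to Lmax are assumptions of this example, as is the function name.

```python
def display_count(t_max: int, level: int, l_max: int) -> int:
    """Number of thumbnail frames to list: T = Tmax * L / Lmax, rounded down."""
    if l_max <= 0:
        raise ValueError("l_max must be positive")
    level = max(0, min(level, l_max))   # assumption: keep the level within 0..Lmax
    return t_max * level // l_max
```

For instance, with Tmax = 100 and Lmax = 10, a display level of 5 gives 50 thumbnail frames.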

[0134] In step S73, thumbnail frames to be listed
are selected according to the number of display thumbnail
frames. For example, thumbnail frames are selected
at constant time intervals or constant frame intervals.
Alternatively, when additional information such as cut
point information is provided, a frame with higher priority,
such as the first frame of a cut point or scene, may be
preferentially selected.
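
One possible reading of step S73 is sketched below: frames at cut points are taken first, and the remaining slots are filled with frames at constant frame intervals. The function name, the use of thumbnail frame indices, and the exact way the two rules are combined are assumptions; the text only states that higher-priority frames may be preferentially selected.

```python
from typing import Iterable, List


def select_thumbnails(num_frames: int, count: int,
                      cut_points: Iterable[int] = ()) -> List[int]:
    """Pick `count` thumbnail frame indices, preferring cut-point frames."""
    if num_frames <= 0 or count <= 0:
        return []
    count = min(count, num_frames)
    # Higher-priority frames (first frames of cuts or scenes) are taken first.
    picked = set(sorted({p for p in cut_points if 0 <= p < num_frames})[:count])
    # Remaining slots are filled with frames at constant frame intervals.
    for i in range(count):
        if len(picked) == count:
            break
        picked.add(i * num_frames // count)
    return sorted(picked)
```

Without cut point information the function reduces to plain constant-interval selection, for example indices 0, 25, 50, and 75 when four of 100 thumbnail frames are requested.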

[0135] In step S74, a list of selected thumbnail 
frames is created and displayed. 

[0136] FIG. 19 shows an interface for listing
thumbnail frames by employing the basic flowchart
shown in FIG. 18. A slider 1102 for specifying a display
level and a thumbnail list 1103 exist on a thumbnail list
1101. When the slider 1102 is moved to a position as
indicated by a slider 1105 to increase the display level, the
number of thumbnail frames to be listed increases as
shown in a listing 1106. By employing such an interface,
the user can intuitively display the necessary thumbnail
frames according to the contents of the video.
[0137] FIG. 20 is an example of screen display
employing the description example shown in FIG. 16. By
employing the description example shown in FIG. 16, a
thumbnail frame with a different resolution or a thumbnail
frame having only a part of the screen cut out can
be handled. Meanwhile, a region in which sampling
with high resolution is desirable, such as a subtitle
portion, and a region in which sampling with low resolution
suffices, such as a background, coexist in an image. A
group of thumbnail frames 1201 including plural thumbnail
frames with different resolutions and regions created
from the same frame is provided, and these
thumbnail frames are displayed superimposed as
shown in a screen display example 1202, thereby making
it possible to display a subtitle with high resolution
and a background with low resolution.
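
The superimposed display can be illustrated roughly as follows: the low-resolution background thumbnail is enlarged to the display size, and the higher-resolution region thumbnail is pasted over it at the position of its region. The list-of-rows image model, pixel repetition as the enlargement method, and the function names are assumptions of this sketch; the region is assumed to fit inside the enlarged background.

```python
from typing import List, Tuple

Image = List[List[int]]


def upscale(image: Image, factor: int) -> Image:
    """Enlarge an image by repeating each pixel and each row `factor` times."""
    return [[px for px in row for _ in range(factor)]
            for row in image for _ in range(factor)]


def superimpose(background: Image, bg_factor: int,
                overlay: Image, ov_factor: int,
                origin: Tuple[int, int]) -> Image:
    """Draw the enlarged overlay on the enlarged background at origin (x, y)."""
    canvas = upscale(background, bg_factor)
    patch = upscale(overlay, ov_factor)
    x0, y0 = origin
    for dy, row in enumerate(patch):
        canvas[y0 + dy][x0:x0 + len(row)] = row   # overwrite background pixels
    return canvas
```

Here the background needs a larger enlargement factor than the overlay, so the subtitle region keeps more of its original detail on screen.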
[0138] FIG. 21 is another example of screen display
employing the description example shown in FIG. 16.
An image 1301 is a thumbnail frame sampled at a low
resolution. When a region 1302 in which the user
desires a more detailed image, such as a subtitle portion,
is pointed to with the mouse or the like, a thumbnail
frame 1303 in which only the region 1302 is sampled at a
higher resolution is displayed by pop-up or the like. In
general, a thumbnail frame with low resolution, such
as the image 1301, is displayed. Thus, the size of each image
can be reduced, and many images can be displayed by
listing them or the like.

[0139] The present invention is not limited to the
above-mentioned embodiments, and can be practiced
with various modifications.

[0140] As has been described above, according to 
the image information describing method of the present 
invention, the contents of video can be retrieved or dis- 
played while they are confirmed. 
[0141] In addition, when retrieval is performed 
based on a thumbnail obtained by sampling original 
video data, even if a target frame for retrieval exists 
between a scene change and another scene change, 
proper video retrieval can be performed. 
[0142] Further, variable speed reproduction can be
performed based on a thumbnail. Thus, the processing
quantity can be reduced, and variable speed reproduction
can be easily achieved even on a device with
low computing power or over a network.

Claims 

1. An image information describing method characterized
by comprising:

sampling a plurality of thumbnail frames from 
video information including a plurality of video 
frames at arbitrary time interval and size; and 
describing attribute information for specifying 
the video frame corresponding to each of the 
thumbnail frames as thumbnail information. 

2. The image information describing method accord- 
ing to claim 1, characterized by further comprising 
describing additional information contains scene 
change position information of the video informa- 
tion. 

3. The image information describing method according
to claim 1, characterized by further comprising
additional information contains frame change value 
information of the video information. 

4. The image information describing method accord- 
ing to claim 1, characterized in that the attribute 
information contains position information indicative 
of a position on a time axis of the video frame cor- 
responding to the thumbnail frame. 

5. The image information describing method accord- 
ing to claim 1, characterized in that the attribute 
information contains information concerning the 
size of the thumbnail frame. 

6. The image information describing method accord- 
ing to claim 1, characterized in that the attribute 
information contains information concerning the
resolution of the thumbnail frame. 

7. The image information describing method according
to claim 1, characterized in that the thumbnail
information contains image data of the thumbnail 
frame or a pointer for the thumbnail frame. 

8. The image information describing method according
to claim 1, characterized in that the plurality of
thumbnail frames are stored as one item of the 
thumbnail information. 

9. A video retrieval method for retrieving video information
including a plurality of video frames by
employing thumbnail information concerning a plurality
of thumbnail frames obtained by sampling the
video information with arbitrary time interval and
size, the video retrieval method characterized by
comprising:

describing, as the thumbnail information,
attribute information containing at least first
position information indicative of a position on a
time axis in order to specify the video frame
corresponding to each of the thumbnail frames;
and
retrieving the thumbnail frame having the closest
first position information to a second position
information indicative of a position on the
time axis of a desired video frame of the predetermined
video information.

10. The video retrieval method according to claim 9,
characterized in that the thumbnail frames contain
a frame obtained by sampling only an arbitrary part
of one frame of the video information with arbitrary
time interval and size.

11. The video retrieval method according to claim 9,
characterized in that the plurality of thumbnail
frames are stored as one item of the thumbnail
information.



12. A video retrieval method for retrieving video information
including a plurality of video frames by
employing thumbnail information concerning a plurality
of thumbnail frames obtained by sampling
video information with arbitrary time interval and
size, the video retrieval method characterized by
comprising:

describing, as the sample image information,
attribute information containing at least first
position information indicative of a position on a
time axis in order to specify the video frame
corresponding to each of the thumbnail frames;
describing, as additional information, scene
change position information of the video information;
and
retrieving a thumbnail frame having the closest
first position information to a second position
information indicative of a position on the time
axis of a desired video information and earlier
or later than the scene change position information.

13. The video retrieval method according to claim 12,
characterized in that the thumbnail frames contain
a frame obtained by sampling only an arbitrary part
of one frame of the video information with arbitrary
time interval and size.

14. The video retrieval method according to claim 12,
characterized in that the plurality of thumbnail
frames are stored as one item of the thumbnail
information.

15. A video retrieval method for retrieving video information
including a plurality of video frames by
employing thumbnail information concerning a plurality
of thumbnail frames obtained by sampling the
video information with arbitrary time interval and
size, the video retrieval method characterized by
comprising:

describing, as the thumbnail information,
attribute information containing at least position
information indicative of a position on a time
axis in order to specify the video frame corresponding
to each of the thumbnail frames; and
retrieving a thumbnail frame in which difference
from a desired video information is equal to or
less than a predetermined threshold.

16. The video retrieval method according to claim 15,
characterized in that the position information
described for a thumbnail frame in which the difference
from the desired video information is equal to
or less than the predetermined threshold is
recorded as the retrieval result.



17. The video retrieval method according to claim 16, 
characterized in that the thumbnail frames contain 
a frame obtained by sampling only an arbitrary part 
of one frame of the video information with arbitrary 
time interval and size. 

18. The video retrieval method according to claim 16,
characterized in that the plurality of thumbnail 
frames are stored as one item of the thumbnail 
information. 

19. A video reproducing method for reproducing video
information including a plurality of video frames at
variable speed by employing thumbnail information
concerning a plurality of thumbnail frames obtained
by sampling the video information with arbitrary
time interval and size, the video reproducing
method characterized by comprising:

describing, as the thumbnail information,
attribute information containing the thumbnail
frames and at least position information indicative
of a position on a time axis in order to specify
the video frames corresponding to the
thumbnail frames;
describing frame change value information of
the video information as additional information;
and
changing a reproduction speed of the thumbnail
frames according to the frame change
value information.

20. The video reproducing method according to claim
19, characterized in that the thumbnail frames contain
a frame obtained by sampling only an arbitrary
part of one frame of the video information with
arbitrary time interval and size.

21. The video reproducing method according to claim
19, characterized in that the plurality of thumbnail
frames are stored as one item of the thumbnail 
information. 

22. A video retrieval apparatus for retrieving video information
including a plurality of video frames by
employing thumbnail information concerning a plurality
of thumbnail frames obtained by sampling the
video information with arbitrary time interval and
size, the video retrieval apparatus characterized by
comprising:

a first describing unit (102) configured to
describe, as the thumbnail information,
attribute information containing at least first
position information indicative of a position on a
time axis in order to specify the video frame
corresponding to each of the thumbnail frames;
a second describing unit (102) configured to
describe, as additional information, scene
change position information of the video information;
and
a retrieving unit (105) configured to retrieve a
thumbnail frame having the closest first position
information to a second position information
indicative of a position on the time axis of a
desired video information and earlier or later
than the scene change position information.

23. The video retrieval apparatus according to claim 22,
characterized in that the thumbnail frames contain
a frame obtained by sampling only an arbitrary part
of one frame of the video information with arbitrary
time interval and size.

24. The video retrieval apparatus according to claim 22,
characterized in that the plurality of thumbnail 
frames are stored as one item of the thumbnail 
information. 

25. A video retrieval apparatus for retrieving video infor- 
mation including a plurality of video frames by 
employing thumbnail information concerning a plu- 
rality of thumbnail frames obtained by sampling the 
video information with arbitrary time interval and 
size, the video retrieval apparatus characterized by 
comprising: 

a describing unit (102) configured to describe, 
as the thumbnail information, attribute informa- 
tion containing at least position information 
indicative of a position on a time axis in order to 
specify the video frame corresponding to each 
of the thumbnail frames; and 
a retrieving unit (105) configured to retrieve a 
thumbnail frame in which difference from a 
desired video information is equal to or less 
than a predetermined threshold. 

26. The video retrieval apparatus according to claim 25, 
characterized in that the thumbnail frames contain 
a frame obtained by sampling only an arbitrary part 
of one frame of the video information with arbitrary 
time interval and size. 

27. The video retrieval apparatus according to claim 25, 
characterized in that the plurality of thumbnail 
frames are stored as one item of the thumbnail 
information. 

28. A video reproducing apparatus for reproducing 
video information including a plurality of video 
frames at variable speed by employing thumbnail 
information concerning a plurality of thumbnail 
frames obtained by sampling the video information 
with arbitrary time interval and size, the video repro- 
ducing apparatus characterized by comprising: 

a first describing unit configured to describe, as 
the thumbnail information, attribute information 
containing the thumbnail frames and at least 
position information indicative of a position on a 
time axis in order to specify the video frame 
corresponding to each of the thumbnail frames; 
a second describing unit configured to describe 
frame change value information of the video 
information in the thumbnail information as 
additional information; and 
a changing unit configured to change a repro- 
duction speed of the thumbnail frames accord- 
ing to the frame change value information. 

29. The video reproducing apparatus according to
claim 28, characterized in that the thumbnail frames
contain a frame obtained by sampling only an arbitrary
part of one frame of the video information with
arbitrary time interval and size.

30. The video reproducing apparatus according to 
claim 28, characterized in that the plurality of 
thumbnail frames are stored as one item of the 
thumbnail information.